 Lebanon


Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Neural Information Processing Systems

Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational


Collective Memory and Narrative Cohesion: A Computational Study of Palestinian Refugee Oral Histories in Lebanon

Awwad, Ghadeer, Dunagan, Lavinia, Gamba, David, Rayan, Tamara N.

arXiv.org Artificial Intelligence

This study uses the Palestinian Oral History Archive (POHA) to investigate how Palestinian refugee groups in Lebanon sustain a cohesive collective memory of the Nakba through shared narratives. Grounded in Halbwachs' theory of group memory, we employ statistical analysis of pairwise similarity of narratives, focusing on the influence of shared gender and location. We use textual representation and semantic embeddings of narratives to represent the interviews themselves. Our analysis demonstrates that shared origin is a powerful determinant of narrative similarity across thematic keywords, landmarks, and significant figures, as well as in semantic embeddings of the narratives. Meanwhile, shared residence fosters cohesion, with its impact significantly amplified when paired with shared origin. Additionally, women's narratives exhibit heightened thematic cohesion, particularly in recounting experiences of the British occupation, underscoring the gendered dimensions of memory formation. This research deepens the understanding of collective memory in diasporic settings, emphasizing the critical role of oral histories in safeguarding Palestinian identity and resisting erasure.


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.


ArabicNLU 2024: The First Arabic Natural Language Understanding Shared Task

Khalilia, Mohammed, Malaysha, Sanad, Suwaileh, Reem, Jarrar, Mustafa, Aljabari, Alaa, Elsayed, Tamer, Zitouni, Imed

arXiv.org Artificial Intelligence

This paper presents an overview of the Arabic Natural Language Understanding (ArabicNLU 2024) shared task, focusing on two subtasks: Word Sense Disambiguation (WSD) and Location Mention Disambiguation (LMD). The task aimed to evaluate the ability of automated systems to resolve word ambiguity and identify locations mentioned in Arabic text. We provided participants with novel datasets, including a sense-annotated corpus for WSD, called SALMA with approximately 34k annotated tokens, and the IDRISI-DA dataset with 3,893 annotations and 763 unique location mentions. These are challenging tasks. Out of the 38 registered teams, only three teams participated in the final evaluation phase, with the highest accuracy being 77.8% for WSD and the highest MRR@1 being 95.0% for LMD. The shared task not only facilitated the evaluation and comparison of different techniques, but also provided valuable insights and resources for the continued advancement of Arabic NLU technologies.
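The LMD result above is reported as MRR@1. As a rough illustration of how such a score can be computed, the sketch below implements mean reciprocal rank with a cutoff k. The function name `mrr_at_k` and the toy rankings are assumptions made for this example and are not taken from the shared task's evaluation scripts. With k = 1 the metric reduces to the fraction of mentions whose top-ranked candidate is correct.

```python
def mrr_at_k(ranked_lists, gold, k):
    """Mean reciprocal rank, counting only hits within the top k candidates."""
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        for rank, cand in enumerate(candidates[:k], start=1):
            if cand == answer:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(gold)


# Toy data (hypothetical): three location mentions, each with ranked candidates.
ranked = [["a", "b"], ["b", "a"], ["c", "a"]]
gold = ["a", "a", "a"]

score_at_1 = mrr_at_k(ranked, gold, k=1)
score_at_2 = mrr_at_k(ranked, gold, k=2)
```

Here only the first mention is resolved at rank 1, while the other two find the correct candidate at rank 2, so widening the cutoff raises the score.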


A lexicon obtained and validated by a data-driven approach for organic residues valorization in emerging and developing countries

Rakotomalala, Christiane, Paillat, Jean-Marie, Feder, Frédéric, Avadí, Angel, Thuriès, Laurent, Vermeire, Marie-Liesse, Médoc, Jean-Michel, Wassenaar, Tom, Hottelart, Caroline, Kieffer, Lilou, Ndjie, Elisa, Picart, Mathieu, Tchamgoue, Jorel, Tulle, Alvin, Valade, Laurine, Boyer, Annie, Duchamp, Marie-Christine, Roche, Mathieu

arXiv.org Artificial Intelligence

The text mining method presented in this paper was used to annotate terms related to the biological transformation and valorization of organic residues in agriculture in low- and middle-income countries. A specialized lexicon was obtained in several steps: corpus creation and term extraction, annotation of the extracted terms, and selection of relevant terms.


All eyes on Israel's response to Iranian drone and missile attacks

BBC News

It could listen to its neighbours in the region and exercise what is known as "strategic patience", holding off from responding in kind and instead continuing to target Iran's proxy allies in the region such as Hezbollah in Lebanon or military supply sites in Syria, as it has been doing for years.


Linking Symptom Inventories using Semantic Textual Similarity

Kennedy, Eamonn, Vadlamani, Shashank, Lindsey, Hannah M, Peterson, Kelly S, OConnor, Kristen Dams, Murray, Kenton, Agarwal, Ronak, Amiri, Houshang H, Andersen, Raeda K, Babikian, Talin, Baron, David A, Bigler, Erin D, Caeyenberghs, Karen, Delano-Wood, Lisa, Disner, Seth G, Dobryakova, Ekaterina, Eapen, Blessen C, Edelstein, Rachel M, Esopenko, Carrie, Genova, Helen M, Geuze, Elbert, Goodrich-Hunsaker, Naomi J, Grafman, Jordan, Haberg, Asta K, Hodges, Cooper B, Hoskinson, Kristen R, Hovenden, Elizabeth S, Irimia, Andrei, Jahanshad, Neda, Jha, Ruchira M, Keleher, Finian, Kenney, Kimbra, Koerte, Inga K, Liebel, Spencer W, Livny, Abigail, Lovstad, Marianne, Martindale, Sarah L, Max, Jeffrey E, Mayer, Andrew R, Meier, Timothy B, Menefee, Deleene S, Mohamed, Abdalla Z, Mondello, Stefania, Monti, Martin M, Morey, Rajendra A, Newcombe, Virginia, Newsome, Mary R, Olsen, Alexander, Pastorek, Nicholas J, Pugh, Mary Jo, Razi, Adeel, Resch, Jacob E, Rowland, Jared A, Russell, Kelly, Ryan, Nicholas P, Scheibel, Randall S, Schmidt, Adam T, Spitz, Gershon, Stephens, Jaclyn A, Tal, Assaf, Talbert, Leah D, Tartaglia, Maria Carmela, Taylor, Brian A, Thomopoulos, Sophia I, Troyanskaya, Maya, Valera, Eve M, van der Horn, Harm Jan, Van Horn, John D, Verma, Ragini, Wade, Benjamin SC, Walker, Willian SC, Ware, Ashley L, Werner, J Kent Jr, Yeates, Keith Owen, Zafonte, Ross D, Zeineh, Michael M, Zielinski, Brandon, Thompson, Paul M, Hillary, Frank G, Tate, David F, Wilde, Elisabeth A, Dennis, Emily L

arXiv.org Artificial Intelligence

An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long-standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content, a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.
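The idea of screening symptom description pairs for related content can be sketched with cosine similarity over embedding vectors. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three-dimensional vectors are hand-written stand-ins for the output of a pre-trained STS model, and the names `cosine`, `link_items`, and the 0.8 threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_items(inventory_a, inventory_b, threshold=0.8):
    """Return (item_a, item_b, score) for pairs at or above the threshold."""
    links = []
    for name_a, vec_a in inventory_a.items():
        for name_b, vec_b in inventory_b.items():
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                links.append((name_a, name_b, score))
    # Highest-scoring candidate links first.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Toy embeddings (hypothetical values, not produced by a real model).
inventory_a = {"headache": [1.0, 0.0, 0.1], "trouble sleeping": [0.0, 1.0, 0.0]}
inventory_b = {"head pain": [0.9, 0.1, 0.0], "insomnia": [0.1, 0.95, 0.0]}

links = link_items(inventory_a, inventory_b)
```

In practice the threshold would need to be tuned against expert judgments, since a fixed cutoff trades missed links against spurious ones.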


Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Anagnostidis, Sotiris, Pavllo, Dario, Biggio, Luca, Noci, Lorenzo, Lucchi, Aurelien, Hofmann, Thomas

arXiv.org Artificial Intelligence

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational requirements during inference. Our method employs a learnable mechanism that determines which uninformative tokens can be dropped from the context at any point across the generation process. By doing so, our approach not only addresses performance concerns but also enhances interpretability, providing valuable insight into the model's decision-making process. Our technique can be applied to existing pre-trained models through a straightforward fine-tuning process, and the pruning strength can be specified by a sparsity parameter. Notably, our empirical findings demonstrate that we can effectively prune up to 80% of the context without significant performance degradation on downstream tasks, offering a valuable tool for mitigating inference costs. Our reference implementation achieves up to a 2× increase in inference throughput and even greater memory savings.
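The effect of dropping tokens from the context during generation can be illustrated with a toy loop. This sketch is not the paper's learnable mechanism: `toy_score` is a hand-written, hypothetical stand-in for a learned keep score, and the 0.5 threshold loosely plays the role of the sparsity parameter. It only shows that tokens pruned at one step stay out of the context for all later steps, so the cached context grows more slowly than the sequence.

```python
def generate_with_pruning(tokens, score_fn, threshold):
    """Process tokens one by one, pruning low-scoring tokens from the context.

    Returns the final context and the context size after each step.
    """
    context = []
    sizes = []
    for current in tokens:
        # Re-score every cached token against the current one; a token that
        # falls below the threshold is dropped and never returns.
        context = [t for t in context if score_fn(t, current) >= threshold]
        context.append(current)  # the current token is always kept
        sizes.append(len(context))
    return context, sizes


# Hypothetical scoring rule: treat a few function words as uninformative.
UNINFORMATIVE = {"the", "on"}


def toy_score(cached_token, current_token):
    """Toy keep score; ignores current_token, unlike a learned mechanism."""
    return 0.1 if cached_token in UNINFORMATIVE else 0.9


final_context, sizes = generate_with_pruning(
    ["the", "cat", "sat", "on", "the", "mat"], toy_score, threshold=0.5
)
```

Raising the threshold prunes more aggressively; with a threshold of 0.0 nothing is dropped and the context size equals the sequence length at every step.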


Russia's Use Of Iranian Drones Shows Up Domestic Weakness

International Business Times

The use by Russia of Iranian drones in its war against Ukraine makes clear the weaknesses of its domestic industry and Tehran's growing claim on the market for unmanned aircraft, experts say. Washington believes Iran has delivered hundreds of drones, which Ukrainian officials say are now being used in strikes like those launched against cities and energy infrastructure on Monday. So far two models of Iranian drone have been identified in Ukraine's skies, built for two different purposes. One of them, the Shahed 136, is a relatively low-cost "kamikaze drone" that can be programmed to fly automatically to a set of GPS coordinates with a payload of explosives. "It flies quite low, striking a target that must be stationary at a range of a few hundred kilometres," said Pierre Grasser, a researcher tied to Paris' Sorbonne University.